Handling of Numeric Ranges with the Subdue System

Authors

  • Oscar E. Romero
  • Jesus A. Gonzalez
  • Lawrence B. Holder
Abstract

Graph-based knowledge discovery has become a powerful tool in machine learning and data mining. It provides a flexible and natural data representation for describing real-world domains. In this work we present a novel algorithm, implemented in the Subdue system, that lets graph-based approaches handle numerical attributes during the data processing phase. Our experimental results show that using numerical attributes increased classification accuracy in the Mutagenesis and PTC domains by 22% compared with the Subdue system without our numerical attribute handling approach. Our method also outperforms other authors' results for the same domains, by around 7% for the Mutagenesis domain and around 17% for the PTC domain.

Recently, artificial intelligence techniques have become tools for solving real-world problems, and the need to represent real-world domains has increased. Many interesting real-world domains have an inherently structured character, for which we need data representations that allow AI algorithms to be applied effectively. Data domains may be classified as flat, sequential, or structural according to their properties. Some of these domains contain important numeric attributes. Graph-based knowledge discovery systems can represent domains with continuous values appropriately, but they do not manipulate them appropriately. To the best of our knowledge, at the time of publishing this work no graph-based knowledge discovery algorithm deals with continuous-valued attributes. A solution proposed in the literature is to apply discretization techniques as a pre-processing or post-processing step, but not during the knowledge discovery phase. Adding this capability to graph-based algorithms will improve the handling of numeric attributes, and with it both the accuracy of the classification task and the descriptive power of the patterns.
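To illustrate the pre-processing alternative mentioned above, here is a minimal sketch of equal-width discretization applied to numeric vertex labels before graph mining. It is not the algorithm proposed in this work, and it does not use any Subdue API. The function names, the choice of equal-width binning, and the sample values are all assumptions made for illustration.

```python
from typing import List


def bin_index(value: float, lo: float, hi: float, k: int) -> int:
    """Return the index (0..k-1) of the equal-width bin containing value."""
    if hi == lo:
        return 0  # all values identical: a single bin
    idx = int((value - lo) / (hi - lo) * k)
    # Clamp so that the maximum value falls into the last bin.
    return min(max(idx, 0), k - 1)


def discretize_labels(values: List[float], k: int) -> List[str]:
    """Replace each numeric label by a textual range label (hypothetical helper).

    The last range also includes the maximum value.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    labels = []
    for v in values:
        i = bin_index(v, lo, hi, k)
        labels.append(f"range[{lo + i * width:.3f}, {lo + (i + 1) * width:.3f}]")
    return labels


# Hypothetical numeric vertex labels (for example, partial charges of atoms).
charges = [-0.12, -0.11, 0.30, 0.31]
labels = discretize_labels(charges, k=2)
```

With exact matching, -0.12 and -0.11 would count as different labels. After binning they share one range label, so a miner that compares labels for equality can treat them as the same vertex type. The bins are fixed before the search starts, which is the limitation the abstract points to.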

Publication date: 2011